 openai clip


Meta CLIP 2: A Worldwide Scaling Recipe

arXiv.org Artificial Intelligence

Contrastive Language-Image Pretraining (CLIP) is a popular foundation model, supporting tasks ranging from zero-shot classification and retrieval to serving as the encoder for multimodal large language models (MLLMs). Although CLIP has been successfully trained on billion-scale image-text pairs from the English world, scaling its training further to worldwide web data remains challenging: (1) no curation method is available to handle data points from the non-English world; and (2) the English performance of existing multilingual CLIP models is worse than that of their English-only counterparts, i.e., the "curse of multilinguality" that is common in LLMs. Here, we present Meta CLIP 2, the first recipe for training CLIP from scratch on worldwide web-scale image-text pairs. To generalize our findings, we conduct rigorous ablations with the minimal changes necessary to address the above challenges, and present a recipe that enables mutual benefits from English and non-English world data. In zero-shot ImageNet classification, Meta CLIP 2 ViT-H/14 surpasses its English-only counterpart by 0.8% and mSigLIP by 0.7%, and, surprisingly, sets a new state of the art without system-level confounding factors (e.g., translation or bespoke architecture changes) on multilingual benchmarks, such as CVQA with 57.4%, Babel-ImageNet with 50.2%, and XM3600 with 64.3% on image-to-text retrieval.
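To make "contrastive" concrete, here is a minimal sketch of the standard CLIP-style symmetric contrastive (InfoNCE) objective in plain Python. It is not Meta CLIP 2's training code: the function names are hypothetical, the embeddings are assumed to be non-zero vectors already produced by some image and text encoders, and the temperature is fixed at 0.07 for simplicity (in CLIP the temperature is a learned parameter).

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def logsumexp(row):
    """Numerically stable log(sum(exp(row)))."""
    m = max(row)
    return m + math.log(sum(math.exp(x - m) for x in row))

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched (image, text) pairs.

    Pair i is (image_embs[i], text_embs[i]); every other pairing in the
    batch serves as a negative example.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: cosine similarity of image i and text j, divided by the temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image -> text: cross-entropy of each row against its diagonal (correct) entry
    loss_i2t = sum(logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    # Text -> image: the same computation over the columns
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = sum(logsumexp(cols[j]) - cols[j][j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2

# Toy 2-d embeddings standing in for encoder outputs.
aligned = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])  # near 0
swapped = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])  # about 14.3
```

Because each image is scored against every caption in the batch, the loss is low only when matching pairs are more similar than all mismatched ones.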


It Happened One Frame: incredibly accurate video content search with OpenAI CLIP

#artificialintelligence

I love movies, so as a fun exercise for my fast.ai … It's named "It Happened One Frame", in tribute to the classic 1934 romantic comedy "It Happened One Night". To use the app, all you need is the link to a YouTube video. For example, you could search "Macaulay Culkin screams with hands on his cheeks" in a Home Alone movie clip and get the screenshots that capture the most iconic scene in that classic. This particular image is so popular that you can easily find it with a Google search.
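The excerpt does not describe how the app is implemented, but text-to-frame search with a CLIP-like model generally comes down to ranking video frames by the similarity between a text embedding and each frame's image embedding. The sketch below assumes those embeddings have already been computed; the function names, the timestamp-keyed dictionary, and the toy vectors are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search_frames(query_emb, frame_embs, k=3):
    """Rank sampled frames by similarity to a text query and return the top k.

    frame_embs maps a timestamp in seconds to that frame's embedding
    (hypothetical: in practice these would come from an image encoder).
    Returns a list of (timestamp, score) pairs, best match first.
    """
    scored = [(ts, cosine(query_emb, emb)) for ts, emb in frame_embs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-d embeddings; a real system would use much larger CLIP vectors.
frames = {0.0: [0.1, 0.9, 0.0], 12.5: [0.9, 0.1, 0.2], 30.0: [0.0, 0.2, 0.95]}
query = [1.0, 0.0, 0.1]
top = search_frames(query, frames, k=1)  # the frame at 12.5 s ranks first
```

The returned timestamps are what a tool like this would turn into screenshots; sampling frames more densely trades indexing time for finer-grained results.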


Pinaki Laskar on LinkedIn: #artificialintelligence #machinelearning #deeplearning

AI Researcher, Cognitive Technologist Inventor - AI Thinking, Think Chain Innovator - AIOT, XAI, Autonomous Cars, IIOT Founder Fisheyebox Spatial Computing Savant, Transformative Leader, Industry X.0 Practitioner

At what stage of development are #artificialintelligence and #machinelearning now? We're living in exciting times, as the narrow AI of statistical ML/DL is poised to be replaced by causal AI/ML/DL. Are there any new breakthrough results? OpenAI shocked the world a year ago with GPT-3. Google presented LaMDA and MUM, two AIs that will revolutionize chatbots and the search engine, respectively.


Wu Dao 2.0 - Bigger, Stronger, Faster AI From China

It is no secret that China has COVID-19 under control. When you travel there, you need to go through a two-week hotel quarantine, but once you are in the country, you are safe. It is probably even safer than before COVID, as wearing a mask is now part of the etiquette and many other viral respiratory diseases are likely on the decline. Hence, when I was invited to speak in the AI for healthcare session of the annual conference of the Beijing Academy of Artificial Intelligence (BAAI), I readily accepted. The BAAI is a great platform for showcasing technology and talent across broad categories.


openai/CLIP

CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and GPT-3. We found CLIP matches the performance of the original ResNet50 on ImageNet "zero-shot" without using any of the original 1.28M labeled examples, overcoming several major challenges in computer vision.

First, install PyTorch 1.7.1 and torchvision, as well as a few small additional dependencies, and then install this repo as a Python package. The function clip.load(name) returns the model and the TorchVision transform needed by the model, specified by a model name returned by clip.available_models(). The name argument can also be a path to a local checkpoint.
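Zero-shot prediction with CLIP scores an image against a set of text prompts: both embeddings are normalized, their dot products (cosine similarities) are multiplied by a scale factor of 100, and a softmax turns the scores into probabilities over the prompts. The sketch below reproduces only that scoring step in plain Python so it runs without PyTorch or model weights; the function names and the 2-d embeddings are hypothetical stand-ins for the outputs of model.encode_image and model.encode_text.

```python
import math

def normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def zero_shot_probs(image_emb, prompt_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and each text prompt."""
    img = normalize(image_emb)
    logits = [logit_scale * sum(a * b for a, b in zip(img, normalize(t)))
              for t in prompt_embs]
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-d embeddings standing in for CLIP encoder outputs.
labels = ["a photo of a dog", "a photo of a cat"]
prompt_embs = [[1.0, 0.0], [0.0, 1.0]]
image_emb = [0.9, 0.1]
probs = zero_shot_probs(image_emb, prompt_embs)
best = labels[probs.index(max(probs))]  # "a photo of a dog"
```

Because the class set is defined only by the prompt strings, swapping in new labels requires no retraining, which is what makes the prediction "zero-shot".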